Bernoulli Rank-1 Bandits for Click Feedback
Abstract
The probability that a user will click a search result depends both on its relevance and on its position on the results page. The position-based model explains this behavior by ascribing to every item an attraction probability, and to every position an examination probability. To be clicked, a result must be both attractive and examined. The probabilities of an item-position pair being clicked thus form the entries of a rank-1 matrix. We propose the learning problem of a Bernoulli rank-1 bandit where, at each step, the learning agent chooses a pair of row and column arms and receives the product of their Bernoulli-distributed values as a reward. This is a special case of the stochastic rank-1 bandit problem considered in recent work that proposed an elimination-based algorithm, Rank1Elim, and showed that Rank1Elim’s regret scales linearly with the number of rows and columns on “benign” instances. These are the instances where the minimum of the average row and column rewards μ is bounded away from zero. The issue with Rank1Elim is that it fails to be competitive with straightforward bandit strategies as μ → 0. In this paper we propose Rank1ElimKL, which replaces the crude confidence intervals of Rank1Elim with confidence intervals based on Kullback-Leibler (KL) divergences. With the help of a novel result concerning the scaling of KL divergences, we prove that with this change, our algorithm will be competitive no matter the value of μ. Experiments with synthetic data confirm that on benign instances the performance of Rank1ElimKL is significantly better than that of even Rank1Elim. Similarly, experiments with models derived from real data confirm that the improvements are significant across the board, regardless of whether the data is benign or not.
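The reward model and the idea of a KL-based confidence bound described in the abstract can be illustrated with a small sketch. This is not the Rank1ElimKL algorithm itself; it only simulates the product-of-Bernoullis reward for one row-column pair and compares a generic KL-style upper confidence bound with a Hoeffding-style one. The probability vectors `u` and `v`, the exploration budget `log(n)`, and all function names are assumptions chosen for illustration.

```python
import math
import random

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence KL(Ber(p) || Ber(q)), clipped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_bound(mean, n, budget, iters=50):
    """Largest q >= mean with n * KL(mean, q) <= budget, found by bisection."""
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if n * bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo

def hoeffding_upper_bound(mean, n, budget):
    """Hoeffding-style bound using the same budget, for comparison."""
    return min(1.0, mean + math.sqrt(budget / (2 * n)))

def pull(u, v, i, j, rng):
    """Reward of pair (i, j): product of two independent Bernoulli draws."""
    row = 1 if rng.random() < u[i] else 0
    col = 1 if rng.random() < v[j] else 0
    return row * col

rng = random.Random(0)
u = [0.05, 0.02, 0.01]  # hypothetical attraction probabilities (row arms)
v = [0.9, 0.5, 0.2]     # hypothetical examination probabilities (column arms)

n = 10000
clicks = sum(pull(u, v, 0, 0, rng) for _ in range(n))
mean = clicks / n
budget = math.log(n)    # assumed exploration budget, not taken from the paper

kl_ucb = kl_upper_bound(mean, n, budget)
hoeff_ucb = hoeffding_upper_bound(mean, n, budget)
print(f"empirical mean={mean:.4f}  KL bound={kl_ucb:.4f}  Hoeffding bound={hoeff_ucb:.4f}")
```

By Pinsker's inequality the KL-style bound is never looser than the Hoeffding-style bound for the same budget, and the gap is largest when the empirical mean is close to zero, which is the regime the abstract singles out as μ → 0.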
Similar Resources
Online Learning to Rank in Stochastic Click Models
Online learning to rank is an important problem in machine learning, information retrieval, recommender systems, and display advertising. Many provably efficient algorithms have been developed for this problem recently, under specific click models. A click model is a model of how users click on a list of documents. Though these results are significant, the proposed algorithms have limited appli...
A New Hybrid Method for Web Pages Ranking in Search Engines
There are many algorithms for optimizing search engine results; ranking takes place according to one or more parameters such as backward links, forward links, content, and click-through rate. The quality and performance of these algorithms depend on the listed parameters. The ranking is one of the most important components of the search engine that represents the degree of the vitality...
DCM Bandits: Learning to Rank with Multiple Clicks
A search engine recommends to the user a list of web pages. The user examines this list, from the first page to the last, and clicks on all attractive pages until the user is satisfied. This behavior of the user can be described by the dependent click model (DCM). We propose DCM bandits, an online learning variant of the DCM where the goal is to maximize the probability of recommending satisfac...
N-Policy for M/G/1 Machine Repair Model with Mixed Standby Components, Degraded Failure and Bernoulli Feedback
In this paper, we study N-policy for a finite population Bernoulli feedback queueing model for machine repair problem with degraded failure. The running times of the machines between breakdowns have an exponential distribution. The repair times of the machines are independent and identically distributed random variables. If at any time a machine fails, it is sent to the repairman for repairing,...
Online Learning to Rank: Absolute vs. Relative
Online learning to rank holds great promise for learning personalized search result rankings. First algorithms have been proposed, namely absolute feedback approaches, based on contextual bandits learning; and relative feedback approaches, based on gradient methods and inferred preferences between complete result rankings. Both types of approaches have shown promise, but they have not previousl...
Publication date: 2017